Collaborating Authors

 County Antrim


Watch: BBC reporter tests AI anti-shoplifting tech

BBC News

Some major retailers and independent stores have introduced AI body scans, CCTV or facial recognition equipment to identify crimes like shoplifting.


Optimizing video analytics inference pipelines: a case study

Ghafouri, Saeid, Ding, Yuming, Chito, Katerine Diaz, del Rincón, Jesús Martinez, O'Connell, Niamh, Vandierendonck, Hans

arXiv.org Artificial Intelligence

Cost-effective and scalable video analytics are essential for precision livestock monitoring, where high-resolution footage and the near-real-time monitoring needs of commercial farms generate substantial computational workloads. This paper presents a comprehensive case study on optimizing a poultry welfare monitoring system through system-level improvements across detection, tracking, clustering, and behavioral analysis modules. We introduce a set of optimizations, including multi-level parallelization, replacement of CPU code with GPU-accelerated code, vectorized clustering, and memory-efficient post-processing. Evaluated on real-world farm video footage, these changes deliver up to a 2x speedup across pipelines without compromising model accuracy. Our findings highlight practical strategies for building high-throughput, low-latency video inference systems that reduce infrastructure demands in agricultural and smart sensing deployments as well as other large-scale video analytics applications.


Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables

Mauro Scanagatta, Giorgio Corani, Cassio P. de Campos, Marco Zaffalon

Neural Information Processing Systems

Parviainen et al. (2014) adopted an anytime integer linear programming (ILP) approach [...] Otherwise it returns a sub-optimal DAG with bounded treewidth. Nie et al. (2014) proposed an efficient anytime ILP approach with a polynomial number of constraints. Nie et al. (2015) proposed the method S2.


SLYKLatent: A Learning Framework for Gaze Estimation Using Deep Facial Feature Learning

Adebayo, Samuel, Dessing, Joost C., McLoone, Seán

arXiv.org Artificial Intelligence

In this research, we present SLYKLatent, a novel approach for enhancing gaze estimation by addressing appearance instability challenges in datasets due to aleatoric uncertainties, covariate shifts, and test domain generalization. SLYKLatent utilizes self-supervised learning for initial training with facial expression datasets, followed by refinement with a patch-based tri-branch network and an inverse explained variance-weighted training loss function. Our evaluation on benchmark datasets achieves a 10.9% improvement on Gaze360, surpasses the top MPIIFaceGaze results by 3.8%, and leads on a subset of ETH-XGaze by 11.6%, surpassing existing methods by significant margins. Adaptability tests on RAF-DB and AffectNet show accuracies of 86.4% and 60.9%, respectively. Ablation studies confirm the effectiveness of SLYKLatent's novel components.


SLED: A Speculative LLM Decoding Framework for Efficient Edge Serving

Li, Xiangchen, Spatharakis, Dimitrios, Ghafouri, Saeid, Fan, Jiakun, Vandierendonck, Hans, John, Deepu, Ji, Bo, Nikolopoulos, Dimitrios

arXiv.org Artificial Intelligence

The growing gap between the increasing complexity of large language models (LLMs) and the limited computational budgets of edge devices poses a key challenge for efficient on-device inference, despite gradual improvements in hardware capabilities. Existing strategies, such as aggressive quantization, pruning, or remote inference, trade accuracy for efficiency or lead to substantial cost burdens. This position paper introduces a new framework that leverages speculative decoding, previously viewed primarily as a decoding acceleration technique for autoregressive generation of LLMs, as a promising approach specifically adapted for edge computing by orchestrating computation across heterogeneous devices. We propose SLED, a framework that allows lightweight edge devices to draft multiple candidate tokens locally using diverse draft models, while a single, shared edge server verifies the tokens using a more precise target model. To further increase the efficiency of verification, the edge server batches the diverse verification requests from devices. This approach supports device heterogeneity and reduces the server-side memory footprint by sharing the same upstream target model across multiple devices. Our initial experiments with Jetson Orin Nano, Raspberry Pi 4B/5, and an edge server equipped with 4 Nvidia A100 GPUs indicate substantial benefits: 2.2× more system throughput, 2.8× more system capacity, and better cost efficiency, all without sacrificing model accuracy.
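The draft-then-verify split described in this abstract can be illustrated with a minimal sketch of greedy speculative decoding. This is not the SLED implementation: the names `draft_tokens`, `verify`, `toy_target`, and `toy_draft` are hypothetical, the "models" are toy functions over integer tokens, and verification is done sequentially here, whereas a real server would check all drafted positions in one batched forward pass.

```python
from typing import Callable, List

Token = int
# Hypothetical interface: a "model" maps a token context to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def draft_tokens(draft_next: NextToken, prefix: List[Token], k: int) -> List[Token]:
    """Device side: propose k tokens greedily with the small draft model."""
    out: List[Token] = []
    for _ in range(k):
        out.append(draft_next(prefix + out))
    return out


def verify(target_next: NextToken, prefix: List[Token], draft: List[Token]) -> List[Token]:
    """Server side: accept drafted tokens while the target model agrees.

    At the first disagreement, the target's own token replaces the draft
    token and the rest of the draft is discarded. If every drafted token is
    accepted, the target contributes one extra token.
    """
    accepted: List[Token] = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    accepted.append(target_next(prefix + accepted))
    return accepted


# Toy models (assumptions for illustration only).
def toy_target(ctx: List[Token]) -> Token:
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Agrees with the target except right after token 3.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


prefix = [1]
draft = draft_tokens(toy_draft, prefix, k=4)
result = verify(toy_target, prefix, draft)
```

In this greedy setting, the verified output is the same sequence the target model would have produced on its own, which is one way to read the claim that accuracy is not sacrificed; the gain comes from the target checking several drafted tokens per round instead of generating them one at a time.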


PrefixNLI: Detecting Factual Inconsistencies as Soon as They Arise

Harary, Sapir, Hirsch, Eran, Slobodkin, Aviv, Wan, David, Bansal, Mohit, Dagan, Ido

arXiv.org Artificial Intelligence

Natural Language Inference (NLI) models have been used in various ways to improve the factuality of LLM outputs. This is typically done by applying an NLI model to judge whether the model output is entailed by the supposed evidence, triggering corrective actions such as beam reranking at inference time or RL rewards during training. While NLI models are trained to detect factual inconsistencies over complete sentences, decisions in the common autoregressive generation architecture are made for each evolving text prefix during decoding. Addressing this setting, we generalize the entailment detection task to apply over arbitrary text prefixes, and suggest its utility for improving generation faithfulness. Providing suitable evaluation and training datasets for this task, we train MiniTruePrefixes, a novel specialized model that better detects factual inconsistencies over text prefixes, outperforming comparable baseline NLI models by 5-14 F1 points in prefix-level entailment. We further demonstrate that integrating MiniTruePrefixes into a controlled decoding framework substantially improves factual consistency in abstractive summarization. When guided by MiniTruePrefixes, LLaMA-3.2-3B-Instruct matches the faithfulness and runtime of the 8B model from the same model family, while using only half the memory.


An Efficient Semantic Segmentation Decoder for In-Car or Distributed Applications

Nazir, Danish, Inti, Gowtham Sai, Bartels, Timo, Piewek, Jan, Bagdonat, Thorsten, Fingscheidt, Tim

arXiv.org Artificial Intelligence

Modern automotive systems leverage deep neural networks (DNNs) for semantic segmentation and operate in two key application areas: (1) in-car, where the DNN operates solely in the vehicle without strict constraints on the data rate, and (2) distributed, where one DNN part operates in the vehicle and the other part typically on a large-scale cloud platform, with a particular constraint on transmission bitrate efficiency. Typically, both applications share an image and source encoder, while each uses distinct (joint) source and task decoders. Prior work utilized convolutional neural networks for joint source and task decoding but did not investigate transformer-based alternatives such as SegDeformer, which offer superior performance at the cost of higher computational complexity. In this work, we propose joint feature and task decoding for SegDeformer, thereby enabling lower computational complexity in both in-car and distributed applications, despite SegDeformer's computational demands. This improves scalability in the cloud while reducing in-car computational complexity. For the in-car application, we increased the frames per second (fps) by up to a factor of 11.7 (1.4 fps to 16.5 fps) on Cityscapes and by up to a factor of 3.5 (43.3 fps to 154.3 fps) on ADE20K, while being on par w.r.t. the mean intersection over union (mIoU) with the transformer-based baseline, which does not compress with a source codec. For the distributed application, we achieve state-of-the-art (SOTA) results over a wide range of bitrates on the mIoU metric, while using only 0.14% (0.04%) of the cloud DNN parameters used in the previous SOTA, reported on ADE20K (Cityscapes).


Learning Bayesian Networks with Thousands of Variables

Mauro Scanagatta, Cassio P. de Campos, Giorgio Corani, Marco Zaffalon

Neural Information Processing Systems

We present a method for learning Bayesian networks from data sets containing thousands of variables without the need for structure constraints. Our approach is made of two parts. The first is a novel algorithm that effectively explores the space of possible parent sets of a node. It guides the exploration towards the most promising parent sets on the basis of an approximated score function that is computed in constant time. The second part is an improvement of an existing ordering-based algorithm for structure optimization. The new algorithm provably achieves a higher score compared to its original formulation. Our novel approach consistently outperforms the state of the art on very large data sets.
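As a rough illustration of the ordering-based idea that the second part builds on, the sketch below shows the classic baseline step: given a fixed variable ordering and precomputed candidate parent sets with scores, each node takes its best-scoring parent set drawn only from its predecessors, which guarantees an acyclic graph. It does not reproduce the paper's improved algorithm or its approximate score function; the function name, the data layout, and the scores are assumptions made for the example.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical layout: for each node, a list of (parent set, score) candidates.
# The empty parent set is assumed to be present for every node.
Candidates = Dict[str, List[Tuple[FrozenSet[str], float]]]


def best_dag_for_ordering(
    order: List[str], candidates: Candidates
) -> Tuple[Dict[str, FrozenSet[str]], float]:
    """Pick, for each node, the highest-scoring parent set that uses only
    nodes appearing earlier in the ordering; return the choices and the
    total (decomposable) score."""
    parents: Dict[str, FrozenSet[str]] = {}
    total = 0.0
    for i, node in enumerate(order):
        allowed = set(order[:i])
        feasible = [(ps, s) for ps, s in candidates[node] if ps <= allowed]
        best_set, best_score = max(feasible, key=lambda item: item[1])
        parents[node] = best_set
        total += best_score
    return parents, total


# Made-up scores (higher is better), for illustration only.
candidates: Candidates = {
    "A": [(frozenset(), -10.0), (frozenset({"B"}), -8.0)],
    "B": [(frozenset(), -12.0), (frozenset({"A"}), -9.0)],
    "C": [(frozenset(), -15.0), (frozenset({"A", "B"}), -11.0)],
}

dag_abc, score_abc = best_dag_for_ordering(["A", "B", "C"], candidates)
dag_bac, score_bac = best_dag_for_ordering(["B", "A", "C"], candidates)
```

With these toy scores, the ordering A, B, C yields a better total than B, A, C, which is why ordering-based methods search over orderings and re-run this per-node selection for each one.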


REAL: Reading Out Transformer Activations for Precise Localization in Language Model Steering

Zhan, Li-Ming, Liu, Bo, Xie, Chengqiang, Cao, Jiannong, Wu, Xiao-Ming

arXiv.org Artificial Intelligence

Inference-time steering aims to alter a large language model's (LLM's) responses without changing its parameters, but a central challenge is identifying the internal modules that most strongly govern the target behavior. Existing approaches often rely on simplistic cues or ad hoc heuristics, leading to suboptimal or unintended effects. We introduce REAL, a framework for identifying behavior-relevant modules (attention heads or layers) in Transformer models. For each module, REAL trains a vector-quantized autoencoder (VQ-AE) on its hidden activations and uses a shared, learnable codebook to partition the latent space into behavior-relevant and behavior-irrelevant subspaces. REAL quantifies a module's behavioral relevance by how well its VQ-AE encodings discriminate behavior-aligned from behavior-violating responses via a binary classification metric; this score guides both module selection and steering strength. We evaluate REAL across eight LLMs from the Llama and Qwen families and nine datasets spanning truthfulness enhancement, open-domain QA under knowledge conflicts, and general alignment tasks. REAL enables more effective inference-time interventions, achieving an average relative improvement of 20% (up to 81.5%) over the ITI method on truthfulness steering. In addition, the modules selected by REAL exhibit strong zero-shot generalization in cross-domain truthfulness-steering scenarios.


What Is The Political Content in LLMs' Pre- and Post-Training Data?

Ceron, Tanise, Nikolaev, Dmitry, Stammbach, Dominik, Nozza, Debora

arXiv.org Artificial Intelligence

Large language models (LLMs) are known to generate politically biased text, yet how such biases arise remains unclear. A crucial step toward answering this question is the analysis of training data, whose political content remains largely underexplored in current LLM research. To address this gap, we present in this paper an analysis of the pre- and post-training corpora of OLMO2, the largest fully open-source model released together with its complete dataset. From these corpora, we draw large random samples, automatically annotate documents for political orientation, and analyze their source domains and content. We then assess how political content in the training data correlates with models' stance on specific policy issues. Our analysis shows that left-leaning documents predominate across datasets, with pre-training corpora containing significantly more politically engaged content than post-training data. We also find that left- and right-leaning documents frame similar topics through distinct values and sources of legitimacy. Finally, the predominant stance in the training data strongly correlates with models' political biases when evaluated on policy issues. These findings underscore the need to integrate political content analysis into future data curation pipelines as well as in-depth documentation of filtering strategies for transparency.